Network Landscape from a Brownian Particle's Perspective

Given a complex biological or social network, how many clusters should it be decomposed into?
We define the distance $d_{ij}$ from node $i$ to node $j$ as the average number of steps a Brownian particle
takes to reach $j$ from $i$. Node $j$ is a global attractor of $i$ if $d_{ij} \le d_{ik}$ for any node $k$ of the graph; it
is a local attractor of $i$ if $j \in E_i$ (the set of nearest neighbors of $i$) and $d_{ij} \le d_{il}$ for any $l \in E_i$.
Based on the intuition that each node should have a high probability to be in the same community
as its global (local) attractor on the global (local) scale, we present a simple method to uncover a
network's community structure. This method is applied to several real networks and some discussion
on its possible extensions is made.

A complex networked system, such as an organism's
metabolic network or genetic interaction network, is
composed of a large number of interacting agents. The
complexity of such systems originates partly from the
heterogeneity in their interaction patterns, aspects of
which include the small-world [1] and the scale-free
properties observed in many social, biological, and
technological networks. Given this high degree of
complexity, it is necessary to divide a network into
different subgroups to facilitate the understanding of the
relationships among different components.

A complex network could be represented by a graph. 
Each component of the network is mapped to a ver- 
tex (node), and the interaction between two components 
is signified by an edge between the two corresponding 
nodes, whose weight is related to the interaction strength. 
The challenge is to dissect this graph based on its connection
pattern. Partitioning a graph into two equally sized subgroups
such that the number of edges in between reaches the absolute
minimum is already an NP-complete problem, so a solution is not
guaranteed to be found easily; it is nevertheless a well-defined
question. On the other hand, the question "How many subgroups
should a graph be divided into, and how?" is ill-posed, as we do
not have an objective function to optimize, and we have to rely
on heuristic reasoning to proceed.

If we are interested in identifying just one community 
that is associated with a specified node, the maximum 
flow method turns out to be efficient. It was recently
applied to identifying communities of Internet webpages [10].
A community thus uncovered is usually very small, and for this
method to work well one needs a priori knowledge of the network
to select the source and sink nodes properly. Another elegant
method is based on the concept of edge betweenness. The degree
of betweenness of an edge is defined as the total number of
shortest paths between pairs of nodes that pass through it. By
recursively removing the current edge with the highest degree of
betweenness, one expects the connectivity of the network to
decrease most efficiently, so that a minimal number of cutting
operations is needed to separate the network into subgroups [7].
This idea of Girvan and Newman could be readily extended to
weighted graphs by assigning each edge a length equal to its
reciprocal weight. Furthermore, in the sociology literature
there is a relatively long tradition of identifying communities
based on the criteria of reachability and shortest distance
(see, e.g., [13]).

In this paper, a new method of network community 
identification is described. It is based on the concept 
of network Brownian motion: If an intelligent Brownian
particle lives in a given network for a long time, what
might be its perspective of the network's landscape? We
suggest that, without the need to remove edges from
the network, the node-node distances "measured" by this
Brownian particle can be used to construct the community
structure and to identify the central node of each
community. This idea is tested on several social and
biological networks and satisfactory results are obtained.
Several ways to extend and improve our method are
discussed.

Consider a connected network of N nodes and M 
edges. Its node set is denoted by $V = \{1, \ldots, N\}$ and
its connection pattern is specified by the generalized
adjacency matrix $A$. If there is no edge between node $i$
and node $j$, $A_{ij} = 0$; if there is an edge in between,
$A_{ij} = A_{ji} > 0$ and its value signifies the interaction
strength (self-connection is allowed). The set of nearest
neighbors of node $i$ is denoted by $E_i$. A Brownian particle
keeps moving on the network, and at each time step
it jumps from its present position (say $i$) to a
nearest-neighboring position ($j$). When no additional knowledge
about the network is available, it is natural to assume the
jumping probability $P_{ij} = A_{ij} / \sum_{l=1}^{N} A_{il}$ (the
corresponding matrix $P$ is called the transfer matrix).
One verifies that at time $t \gg M$ the probability $p(k)$ for
the Brownian particle to be at any node $k$ is nonvanishing
and equals $\sum_{l} A_{kl} / \sum_{m,n} A_{mn}$, proportional to
the total interaction capacity $\sum_{l} A_{kl}$ of node $k$.
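
The transfer matrix and the long-time occupation probability can be checked numerically. The following is a minimal sketch in plain Python, assuming the network is given as a dense symmetric list-of-lists matrix; the function names and the 3-node example are illustrative and not part of the original work.

```python
def transfer_matrix(A):
    """Return P with P[i][j] = A[i][j] / sum_l A[i][l], so every row sums to 1."""
    P = []
    for row in A:
        strength = sum(row)
        if strength <= 0:
            raise ValueError("every node needs at least one edge")
        P.append([a / strength for a in row])
    return P


def stationary_distribution(A):
    """Return p with p[k] = sum_l A[k][l] / sum_{m,n} A[m][n]."""
    total = sum(sum(row) for row in A)
    return [sum(row) / total for row in A]


# Hypothetical weighted triangle (symmetric, no self-connections).
A = [[0, 1, 1],
     [1, 0, 2],
     [1, 2, 0]]
P = transfer_matrix(A)
p = stationary_distribution(A)

# One more jump leaves p unchanged: sum_i p[i] * P[i][k] equals p[k].
p_next = [sum(p[i] * P[i][k] for i in range(3)) for k in range(3)]
```

Comparing `p_next` with `p` confirms that the occupation probability proportional to the interaction capacity is invariant under one jump.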

Define the node-node distance $d_{ij}$ from $i$ to $j$ as the
average number of steps needed for the Brownian particle
to move from $i$ through the network to $j$. From some
simple linear-algebra calculation it is easy to see that

$$d_{ij} = \sum_{l=1}^{N} \left( [I - B(j)]^{-1} \right)_{il},$$

where $I$ is the $N \times N$ identity matrix, and matrix $B(j)$
equals the transfer matrix $P$ except that $B_{lj}(j) = 0$
for any $l \in V$. The distances from all the nodes in $V$ to
node $j$ can thus be obtained by solving the linear algebraic
equation $[I - B(j)] \{d_{1j}, \ldots, d_{Nj}\}^T = \{1, \ldots, 1\}^T$.
We are mainly interested in sparse networks with $M = O(N)$;
for such networks there exist very efficient algorithms
to solve this equation. If
node $j$ has the property that $d_{ij} \le d_{ik}$ for any $k \in V$,
then $j$ is tagged as a global attractor of node $i$ ($i$ is closest
to $j$ in the sense of average distance). Similarly, if $j \in E_i$
and $d_{ij} \le d_{il}$ for any $l \in E_i$, then $j$ is a local attractor
of $i$ ($i$ is closest to $j$ among all its nearest neighbors). We
notice that in general the distance from $i$ to $j$ ($d_{ij}$)
differs from that from $j$ to $i$ ($d_{ji}$). Consequently, if $j$ is an
attractor of $i$, node $i$ is not necessarily also an attractor
of $j$.
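
As an illustration of the distance calculation and the attractor definitions, here is a small sketch that solves the linear system for each target node $j$ with dense Gauss-Jordan elimination. This is only suitable for small examples; it is not the efficient sparse algorithm referred to in the text. The function names, the tie-breaking rule (lowest node index wins) and the 3-node path graph are assumptions made for the example.

```python
def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (is the network connected?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def brownian_distances(A):
    """Return d with d[i][j] = average number of steps from node i to node j."""
    n = len(A)
    P = [[a / sum(row) for a in row] for row in A]
    d = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # I - B(j), where B(j) is P with column j set to zero.
        M = [[(1.0 if i == l else 0.0) - (0.0 if l == j else P[i][l])
              for l in range(n)] for i in range(n)]
        column = solve_linear(M, [1.0] * n)
        for i in range(n):
            d[i][j] = column[i]
    return d


def global_attractor(d, i):
    """Node k minimizing d[i][k] over all nodes (ties: lowest index)."""
    return min(range(len(d)), key=lambda k: d[i][k])


def local_attractor(A, d, i):
    """Nearest neighbor l of i minimizing d[i][l] (ties: lowest index)."""
    neighbors = [l for l in range(len(A)) if A[i][l] > 0]
    return min(neighbors, key=lambda l: d[i][l])


# Hypothetical unweighted path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
d = brownian_distances(A)
attractors = [global_attractor(d, i) for i in range(3)]
```

For this path graph the middle node is the global attractor of every node, including itself. Note that `d[i][i]` is the mean return time to node `i`, which is what allows a node to be its own global attractor.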

If a graph is divided into different subgroups, on the lo- 
cal scale we intuitively expect that each node i will have 
a high probability to be in the same subgroup as its lo- 
cal attractor j, since among all the nearest-neighboring 
nodes in Ei , node j has the shortest "distance" from node 
i. For simplicity let us just assume this probability to be 
unity (a possible improvement is discussed later). Thus, 
we can define a local-attractor-based community (or simply
an "L-community") as a set of nodes $L = \{i_1, \ldots, i_m\}$
such that (1) if node $i \in L$ and node $j$ is a local attractor
of $i$, then $j \in L$; (2) if $i \in L$ and node $k$ has $i$ as its
local attractor, then $k \in L$; and (3) no proper subset of $L$ is
an L-community. Clearly, two L-communities $L_a$ and $L_b$
are either identical ($L_a = L_b$) or disjoint ($L_a \cap L_b = \emptyset$).
Based on each node's local attractor the graph could be
decomposed into a set of L-communities.
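
One way to read conditions (1)-(3) is that the L-communities are the connected components of the map that sends each node to its local attractor, with the direction of the arrows ignored. The sketch below implements that reading with a union-find structure. It assumes each node has exactly one attractor (ties already broken), and the function name and the example attractor list are hypothetical.

```python
def communities_from_attractors(attractor):
    """Group nodes into connected components of the map i -> attractor[i].

    attractor[i] is the (single) attractor of node i. Edge direction is
    ignored, so i and attractor[i] always end up in the same group.
    """
    parent = list(range(len(attractor)))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(attractor):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for i in range(len(attractor)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical attractor list for six nodes.
attractor = [1, 1, 1, 4, 3, 4]
communities = communities_from_attractors(attractor)
# Nodes that are their own attractor.
centers = [i for i, j in enumerate(attractor) if i == j]
```

The same routine applies unchanged to global attractors, in which case the groups are the G-communities. In this example the second group contains nodes 3 and 4 pointing at each other, so it has no self-attracting node.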

According to the same intuitive argument, on the 
global scale we expect that each node will have a high 
probability to be in the same community as its global 
attractor, and if we assume this probability to be unity we
can similarly construct the global-attractor-based communities
("G-communities") based on the global attractor
of each node. For small networks, we expect the L-
and G-community structures to be identical, while for
large networks each G-community may contain several
L-communities as its subgroups. A community $C$ could be
characterized by its size $N_c$ and an instability index $I_c$.
A node $i$ in community $C$ is referred to as unstable if
its total direct interaction with the nodes of some other
community $C'$, $\sum_{k \in C'} A_{ik}$, is stronger than its total
direct interaction with the other nodes of its own community,
$\sum_{k \in C \setminus i} A_{ik}$. $I_c$ is the total number of such nodes in
the community. We can also identify the center of a community
(if it exists) as the node that is the global attractor
of itself.
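
The instability criterion translates directly into a per-node comparison of summed edge weights. The following sketch assumes a dense symmetric weight matrix and a partition given as lists of node indices; the names and the 5-node example are illustrative only.

```python
def unstable_nodes(A, communities):
    """Return the nodes whose total interaction with some other community
    is strictly stronger than with the other nodes of their own community."""
    label = {}
    for c, members in enumerate(communities):
        for i in members:
            label[i] = c

    unstable = []
    for i in range(len(A)):
        strength = [0.0] * len(communities)
        for k, weight in enumerate(A[i]):
            if k != i:  # the node itself is excluded from its own community
                strength[label[k]] += weight
        own = strength[label[i]]
        if any(s > own for c, s in enumerate(strength) if c != label[i]):
            unstable.append(i)
    return unstable


# Hypothetical weighted network: edges 0-1 (1), 1-2 (1), 2-3 (2), 2-4 (1), 3-4 (3).
A = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 2, 1],
     [0, 0, 2, 0, 3],
     [0, 0, 1, 3, 0]]
communities = [[0, 1, 2], [3, 4]]
unstable = unstable_nodes(A, communities)
# Instability index I_c of each community.
instability = [sum(1 for i in members if i in unstable) for members in communities]
```

Here node 2 interacts with its own community with total weight 1 but with the other community with total weight 3, so it is the only unstable node.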

Now we test the above-mentioned simple method on
some well-documented networks whose community structures
are known. The first example is the social network
recorded by Zachary [16]. This network contains 34 nodes
and 77 weighted edges, and it was observed to spontaneously
fission into two groups of size 16 and 18, respectively
[16] (these two groups are marked by two colors in
Fig. 1(a)). The results of our method are shown in Fig. 1(a).
Community $L_1$ contains 11 elements (node 13 is unstable
and has stronger direct interaction with $L_2$), $L_2$ has 6
elements (node 9 has stronger direct interaction with $L_3$),
and $L_3$ has 17 elements. Nodes 1 (the manager), 3, and
34 (the officer) are the corresponding centers. We find
that for this network the G-communities coincide with
the L-communities.

As another example, the scientific collaboration network
of the Santa Fe Institute is considered. The giant
connected component contains 118 nodes and 200
weighted edges; the weights are assigned according to
the measure in [17]. The present method divides the network
into six L-communities, see Fig. 1(b). All the nodes in
communities $L_1$ (size 14), $L_2$ (41), $L_4$ (8), $L_5$ (26), and $L_6$
(17) are locally stable, and one node in $L_3$ has stronger
direct interaction with another community. As in the above
example, the G-community structure is identical to
the L-community structure. Girvan and Newman divided
this network into four major groups by recursively
removing edges of highest degree of betweenness [7]; the
largest of these was further divided into three subgroups
and the second largest into two subgroups.
There are still some minor differences between the six
subgroups obtained by the present method and those
obtained in [7], which may be attributed to the fact that
in the treatment of [7] the network was regarded as
unweighted.

The method is further tested on a relatively more complicated
case, the football match network compiled by
Girvan and Newman [7]. It contains 115 nodes and 613
unweighted edges. These 115 teams were distributed into
12 conferences by the game organizers. Based on the connection
pattern, the present method divides them into 15
L-communities, of which 11 are locally stable: $L_2$ (size 9),
$L_3$ (13), $L_4$ (14), $L_5$ (10), $L_6$ (8), $L_7$ (6), $L_8$ (7), $L_9$ (6),
$L_{10}$ (4), $L_{11}$ (6), and $L_{13}$ (size 9). One element of $L_1$ (size
9) has stronger interaction with $L_{10}$, one element of
$L_{12}$ (size 10) has stronger interaction with $L_3$, and all the
elements of $L_{14}$ (size 2) and $L_{15}$ (size 2) are locally unstable.
The G-communities of this network are also identical
to the L-communities. The community structure
of this network is shown in Fig. 1(c), where nodes belonging to
each identified community are located together, and the
different colors encode the actual 12 conferences [7].
Figure 1(c) indicates that the predicted communities coincide
very well with the actual communities. The community
structure obtained by the present method is also in very
good correspondence with that obtained by Girvan and
Newman based on edge betweenness.

The above-studied networks all have relatively small
sizes, and the identified G-communities coincide
with the L-communities. Now we apply our method to
the protein interaction network (yeast core [18, 19]) of
baker's yeast. The giant connected component of this
network contains 1471 proteins and 2770 edges (assumed
to be unweighted, since the interaction strengths between
the proteins are generally undetermined). The
present method dissects this giant component into 14
G-communities (Table I) and into 69 L-communities (11
of them contain one locally unstable node, 15 of them
have 2-7 locally unstable nodes, and all the others are stable).
The relationship between the G- and L-communities is
demonstrated in Fig. 1(d), where proteins are grouped into
L-communities and those of the same G-community have
the same color. We see from Fig. 1(d) that if two nodes
are in the same L-community, they are very likely to
be in the same G-community. The largest G-community
($G_1$) contains more than half of the proteins and is
centered around nucleoporin YMR047C, which, according to
its SWISS-PROT description [20], is "an essential component
of nuclear pore complex" and "may be involved in
both binding and translocation of the proteins during
nucleocytoplasmic transport". YMR047C interacts directly
with only 39 other proteins (it is not even the most
connected node in the system), but associated with it is a
group of 935 proteins as suggested by the present method.
The protein interaction network may have evolved to
facilitate efficient protein transportation by protein-mediated
indirect interactions.

What will happen if the protein YMR047C is removed 
from the network? The resulting perturbed system has 
1463 nodes and 2729 edges, and we find that its L- 
community structure does not change much. Altogether 
72 L-communities are identified, and most of them con- 
tain more or less the same set of elements as in the un- 
perturbed network. However, there is a dramatic change 
in the G-community structure. There are now 21
G-communities (the largest of which has 574 proteins),
while $G_1$ of the original system breaks up into eight
smaller G-communities. It has been shown that the most
highly connected proteins in the cell are the most
important for its survival, and mutations in these proteins
are usually lethal [21]. Our work suggests that these
highly connected proteins are especially important because
they help integrate many small functional modules
(L-communities) into a larger unit (G-community),
enabling the cell to perform concerted reactions in
response to environmental stimuli.

In the above examples, the networks studied are all from
the real world. We have also tested the performance of our
method on some artificial networks generated by
computer. To compare with the results of Ref. [7], we
generated an ensemble of random graphs with 128 vertices.
These vertices are divided into four groups of 32 vertices
each. Each vertex has on average 16 edges, $z_{out}$ of which
are to vertices of other groups, and the remaining are to
vertices within its group; in all other respects these edges
are drawn randomly and independently. Using
the method of Girvan and Newman, it was reported [7]
that when $z_{out} < 6$ all the vertices could be classified
with high probability. Our present method in its simplest
form could work perfectly only when $z_{out} < 2.5$.
In the artificial network, the vertices are identical with
each other in the statistical sense and there is no correlation
between the degrees of two neighboring vertices. Our
method seems not to be the best for this kind of random
network.
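
A benchmark graph of this kind can be generated with a few lines of code. The sketch below is one plausible construction, not necessarily the one used for the reported tests: it assumes that each within-group pair is linked independently with probability $(16 - z_{out})/31$ and each between-group pair with probability $z_{out}/96$, which gives the stated average degrees. The function name, its parameters and the fixed seed are hypothetical, and connectivity of the result is not checked.

```python
import random


def planted_partition(z_out, n_groups=4, group_size=32, mean_degree=16, seed=0):
    """Random graph with equal-sized groups and z_out inter-group edges per vertex
    on average. Returns (adjacency matrix, list of group labels)."""
    rng = random.Random(seed)
    n = n_groups * group_size
    # Assumed edge probabilities that reproduce the average degrees.
    p_in = (mean_degree - z_out) / (group_size - 1)
    p_out = z_out / (n - group_size)
    if not (0.0 <= p_in <= 1.0 and 0.0 <= p_out <= 1.0):
        raise ValueError("z_out is incompatible with the requested mean degree")

    group = [v // group_size for v in range(n)]
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if group[u] == group[v] else p_out
            if rng.random() < p:
                A[u][v] = A[v][u] = 1
    return A, group


A, group = planted_partition(z_out=2.0)
observed_mean_degree = sum(map(sum, A)) / len(A)
```

Feeding such a matrix to a community-detection routine and comparing the result with `group` gives the fraction of correctly classified vertices for a given `z_out`.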

In summary, we have suggested a simple way of group- 
ing a graph of nodes and edges into different subgraphs 
based on the node-node distance measured by a Brown- 
ian particle. The basic idea was applied to several real 
networked systems and very encouraging results were obtained.
The concept of random walking was also used in
some recent efforts to facilitate searching on networks
(see, e.g., [22, 23]); the present work may be the first
attempt at applying it to identifying network community
structure. Some possible extensions of our method are
immediately conceivable. First, in the present work we
have assumed that a node will be in the same community
as its attractor with probability 1. Naturally, we can
introduce an "inverse temperature" $\beta$ and suppose that
node $i$ is in the same community as node $j$ with
probability proportional to $\exp(-\beta d_{ij})$. The present work
discusses just the zero-temperature limit. We believe
that the communities identified at zero temperature will
persist until the temperature is high enough. Second,
we can construct a coarse-grained network by regarding
each L-community as a single node, and defining the dis- 
tance from one L-community to another as the average 
node-node distance between nodes in these two commu- 
nities. The present method can then be applied, and the 
relationship between different L-communities can be bet- 
ter understood. Third, for very large networks, it is im- 
practical to consider the whole network when calculating 
node-node distance. Actually this is not necessary, since 
the length of the shortest path between a given node and 
its attractor should be small. We can therefore focus on 
a localized region of the network to identify the attractor 
of a given node. 
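
The finite-temperature extension mentioned first can be illustrated with a small numerical sketch. The text only states that the probability is proportional to $\exp(-\beta d_{ij})$; normalizing over a given set of candidate nodes, as done here, is an assumption, and the function name and example distances are hypothetical.

```python
import math


def membership_probabilities(distances, beta):
    """distances maps each candidate node j to d_ij for a fixed node i.
    Returns probabilities proportional to exp(-beta * d_ij), normalized
    over the candidates (the normalization set is an assumption)."""
    d_min = min(distances.values())
    # Subtracting d_min avoids underflow and does not change the ratios.
    weights = {j: math.exp(-beta * (dist - d_min)) for j, dist in distances.items()}
    norm = sum(weights.values())
    return {j: w / norm for j, w in weights.items()}


# Hypothetical distances from node 0 to two candidate nodes.
d_0 = {1: 1.0, 2: 4.0}
cold = membership_probabilities(d_0, beta=50.0)  # close to the zero-temperature limit
warm = membership_probabilities(d_0, beta=0.1)   # high temperature
```

At large `beta` essentially all weight sits on the nearest candidate, which recovers the attractor rule used in the present work; at small `beta` the weights become more nearly uniform.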

Furthermore, based on the distance measure of the 
present paper, we can define a quantity called the dis- 
similarity index for any two nearest-neighboring nodes. 
Nearest-neighboring vertices of the same community tend
to have a small dissimilarity index, while those belonging
to different communities tend to have a high dissimilarity
index. Extensions of the present work will be reported
in a forthcoming paper [24].

An interesting task is to use extended versions of the
present method to explore the landscape of the Internet's
autonomous system [3] and that of the metabolic network
of E. coli.

I am grateful to M. Girvan and M. E. J. Newman for 
sharing data and to Professor R. Lipowsky for support.

TABLE I: G-communities of yeast's protein interaction network
[18, 19]. $N_c$ is the community size, $I_c$ is the number of
locally unstable nodes.

FIG. 1: (Color) Community structure of some model networks (the nodes of the same L-community are spatially grouped
together). (a) The karate club network compiled by Zachary [16] (here nodes are colored according to their actual groupings);
(b) the scientific collaboration network compiled by Girvan and Newman [7]; (c) the football match network compiled by
Girvan and Newman [7] (nodes are colored according to their actual groupings); and (d) the yeast protein interaction network
[18, 19]; here nodes of the same G-community are encoded with the same color (open circles denote nodes in $G_1$).



